This is an introduction to Monte Carlo methods.

1. Gibbs Sampling

(1) Bayesian Simple Regression (Using index.txt)

\[\begin{aligned}(y_i|\boldsymbol{\beta}, \sigma^2 ) &\sim \mathcal N(\beta_0+\beta_1x_i, \sigma^2),\quad i=1,\dots, n\\ \beta_j &\sim \mathcal N(0, 2.5^2),\quad j=0,1 \\ \sigma^2 &\sim \text{IG}(a,b),\quad a=b=0.001 \end{aligned}\]
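To make the model concrete, one can simulate a dataset from the likelihood. The sketch below (Python/NumPy, not part of the assignment) fixes the parameters at values close to the posterior locations reported later (\(\beta_0\approx5\), \(\beta_1\approx2.8\), \(\sigma^2\approx90\)); the sample size and covariate range are hypothetical.

```python
import numpy as np

# Simulate (x_i, y_i) pairs from y_i ~ N(beta_0 + beta_1 * x_i, sigma^2).
# Parameter values are illustrative, chosen near the posterior means found below.
rng = np.random.default_rng(1)
n = 51                                  # hypothetical sample size
x = rng.uniform(5, 25, size=n)          # hypothetical covariate values
beta0, beta1, sigma2 = 5.0, 2.8, 90.0
y = rng.normal(beta0 + beta1 * x, np.sqrt(sigma2))
print(y.shape)                          # (51,)
```

A Gibbs sampler run on such simulated data should recover the chosen parameter values, which is a useful sanity check before applying it to real data.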

1) posterior of \(\beta_0, \beta_1, \sigma^2\)

The posterior of \(\boldsymbol{\beta}=(\beta_0,\beta_1)^T, \sigma^2\) can be obtained as follows: \[\begin{aligned} f(\boldsymbol{\beta}, \sigma^2|y) &\propto f(y|\boldsymbol{\beta}, \sigma^2)f(\boldsymbol{\beta},\sigma^2) \\ &= f(y|\boldsymbol{\beta}, \sigma^2)f(\boldsymbol{\beta})f(\sigma^2) \\ &\propto \prod_{i=1}^n\frac{1}{(\sigma^2)^{1/2}}\exp\biggl\{-\frac{1}{2\sigma^2}(y_i - \beta_0-\beta_1x_i)^2 \biggr\} \cdot \prod_{j=0}^1\exp\biggl\{-\frac{1}{2\cdot2.5^2}\beta_j^2 \biggr\} \cdot (\sigma^2)^{-a-1}\exp\{-b/\sigma^2\}. \end{aligned}\] For simplicity, we write \(f(\beta_0|-)\), for example, for the full conditional of \(\beta_0\) given the rest of the parameters and the data.

\[\begin{aligned} f(\beta_0|-) &\propto \prod_{i=1}^n\exp\biggl\{-\frac{1}{2\sigma^2}(y_i - \beta_0-\beta_1x_i)^2 \biggr\} \cdot \exp\biggl\{-\frac{1}{2\cdot2.5^2}\beta_0^2\biggr\} \\ &\propto \prod_{i=1}^n\exp\biggl\{-\frac{1}{2\sigma^2}(\beta_0^2 -2y_i\beta_0+2\beta_1x_i\cdot\beta_0) \biggr\}\exp\biggl\{-\frac{1}{2\cdot2.5^2}\beta_0^2\biggr\} \\ &=\exp\biggl\{-\frac{1}{2\sigma^2}\Bigl(n\beta_0^2 -2\sum_{i=1}^n(y_i - \beta_1x_i)\beta_0 \Bigr) -\frac{1}{2\cdot2.5^2}\beta_0^2 \biggr\} \\ &= \exp\biggl\{-\frac{1}{2}\biggl( \Bigl(\frac{n}{\sigma^2}+\frac{1}{2.5^2} \Bigr)\beta_0^2 - 2\sum_{i=1}^n(y_i-\beta_1x_i)\beta_0/\sigma^2 \biggr) \biggr\} \\ &\propto \exp\biggl\{-\frac{1}{2}\Bigl(\frac{n}{\sigma^2}+\frac{1}{2.5^2} \Bigr)\biggl(\beta_0 - \frac{\sum_{i=1}^n(y_i-\beta_1x_i)/\sigma^2}{(n/\sigma^2 + 1/2.5^2)} \biggr)^2 \biggr\} \\ &\propto \mathcal N\biggl( \frac{\sum_{i=1}^n(y_i-\beta_1x_i)/\sigma^2}{(n/\sigma^2 + 1/2.5^2)} , \frac{1}{n/\sigma^2 + 1/2.5^2} \biggr) \end{aligned}\]

Similarly, the posterior of \(\beta_1\) is (\(\sum_{i=1}^n \Rightarrow\sum_i, \prod_{i=1}^n \Rightarrow \prod_i\) for simplicity) \[\begin{aligned} f(\beta_1|-) &\propto \prod_i\exp\biggl\{-\frac{1}{2\sigma^2}(y_i - \beta_0-\beta_1x_i)^2 \biggr\} \cdot \exp\biggl\{-\frac{1}{2\cdot2.5^2}\beta_1^2\biggr\} \\ &\propto \prod_i\exp\biggl\{-\frac{1}{2\sigma^2}(x_i^2\beta_1^2 - 2x_iy_i\beta_1 + 2\beta_0x_i\cdot\beta_1) \biggr\}\cdot\exp\biggl\{-\frac{1}{2\cdot2.5^2}\beta_1^2\biggr\} \\ &= \exp\biggl\{-\frac{1}{2\sigma^2}\Bigl(\sum_ix_i^2\beta_1^2 - 2\beta_1\sum_i(x_iy_i - \beta_0x_i) \Bigr) -\frac{1}{2\cdot2.5^2}\beta_1^2 \biggr\} \\ &= \exp\biggl\{-\frac{1}{2}\biggl(\Bigl(\sum_ix_i^2/\sigma^2+1/2.5^2 \Bigr)\beta_1^2 -2\beta_1\sum_ix_i(y_i-\beta_0)/\sigma^2 \biggr) \biggr\} \\ &\propto \exp\biggl\{-\frac{1}{2}\Bigl(\sum_ix_i^2/\sigma^2+1/2.5^2 \Bigr) \biggl( \beta_1 - \frac{\sum_ix_i(y_i-\beta_0)/\sigma^2}{\sum_ix_i^2/\sigma^2+1/2.5^2}\biggr)^2 \biggr\} \\ &\propto \mathcal N\biggl(\frac{\sum_ix_i(y_i-\beta_0)/\sigma^2}{\sum_ix_i^2/\sigma^2+1/2.5^2}, \frac{1}{\sum_ix_i^2/\sigma^2+1/2.5^2} \biggr). \end{aligned}\]

The posterior of \(\sigma^2\) is (\(a = 0.001, b = 0.001\))

\[\begin{aligned} f(\sigma^2|-) &\propto \prod_i\frac{1}{(\sigma^2)^{1/2}}\exp\biggl\{-\frac{1}{2\sigma^2}(y_i - \beta_0-\beta_1x_i)^2 \biggr\}\cdot(\sigma^2)^{-a-1}\exp\{-b/\sigma^2\} \\ &\propto \frac{1}{(\sigma^2)^{n/2}}\exp\biggl\{-\frac{1}{2\sigma^2}\sum_i(y_i-\beta_0-\beta_1x_i)^2 \biggr\}(\sigma^2)^{-a-1}\exp\{-b/\sigma^2\} \\ &=(\sigma^2)^{-(a+n/2)-1}\exp\biggl\{ -\Bigl(\frac{1}{2}\sum_i(y_i-\beta_0-\beta_1x_i)^2+b \Bigr)/\sigma^2 \biggr\} \\ &\propto \text{IG}\biggl(a + \frac{n}{2}, \frac{1}{2}\sum_i(y_i-\beta_0-\beta_1x_i)^2+b \biggr). \end{aligned}\]
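Since an \(\text{IG}(\text{shape},\text{rate})\) draw is the reciprocal of a \(\Gamma(\text{shape},\text{rate})\) draw, the \(\sigma^2\) update can be sketched without a dedicated inverse-gamma routine. The snippet below is a Python/NumPy illustration with hypothetical residuals, not the R code used later.

```python
import numpy as np

rng = np.random.default_rng(0)
a = b = 0.001
resid = rng.normal(0.0, 3.0, size=100)   # hypothetical y_i - beta_0 - beta_1*x_i

shape = a + len(resid) / 2               # a + n/2
rate = b + 0.5 * np.sum(resid**2)        # b + (1/2) * sum of squared residuals
# If G ~ Gamma(shape, rate), then 1/G ~ IG(shape, rate); numpy takes scale = 1/rate.
sigma2_draw = 1.0 / rng.gamma(shape, 1.0 / rate)
print(sigma2_draw > 0)                   # True
```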

2) code implementation

We have obtained all of the full conditionals, so we can write the corresponding Gibbs sampler code. The vectors X and Y are chosen as follows, with ‘PovPct’ as the explanatory variable and ‘TeenBrth’ as the response.

The initial values of \(\beta_0, \beta_1\) are generated from \(\mathcal N(0, 2.5^2)\), and \(\sigma^2\) is initialized to a sufficiently small value, \(0.001\).

The sampler runs for \(5000\) iterations with a burn-in period of \(2000\). The generated Gibbs samples are stored in the beta_0s, beta_1s, and sigma2s vectors, respectively.

rm(list = ls())
set.seed(2023311161)
library(invgamma) # for inverse Gamma distribution

index <- read.table("index.txt", header = TRUE)
head(index)
##     Location PovPct Brth15to17 Brth18to19 ViolCrime TeenBrth
## 1    Alabama   20.1       31.5       88.7      11.2     54.5
## 2     Alaska    7.1       18.9       73.7       9.1     39.5
## 3    Arizona   16.1       35.0      102.5      10.4     61.2
## 4   Arkansas   14.9       31.6      101.7      10.4     59.9
## 5 California   16.7       22.6       69.1      11.2     41.1
## 6   Colorado    8.8       26.2       79.1       5.8     47.0
X <- index[,'PovPct'];Y <- index[,'TeenBrth']
n <- nrow(index)

# initial values
beta_0 <- rnorm(1, 0, 2.5)
beta_1 <- rnorm(1, 0, 2.5)
sigma2 <- 0.001

iter_num <- 5000
burnin <- 2000

# for sample storage
beta_0s <- rep(0,iter_num)
beta_1s <- rep(0,iter_num)
sigma2s <- rep(0,iter_num)

In section 1) above, we derived the full conditional of each parameter, which gives the following Gibbs sampler R code. Since the burn-in period is set to \(2000\), we discard the first \(2000\) samples.

# Gibbs sampler code
for(iter in 1:iter_num){
  # sample beta_0, beta_1, sigma2
  beta_0 <- rnorm(1, 
                  (sum(Y - beta_1*X)/sigma2)/(n/sigma2 + 1/2.5^2), 
                  sqrt(1/(n/sigma2 + 1/2.5^2)) 
                  )
  beta_1 <- rnorm(1, 
                  (sum(X*(Y - beta_0))/sigma2)/(sum(X^2)/sigma2 + 1/2.5^2), 
                  sqrt(1/(sum(X^2)/sigma2 + 1/2.5^2))
                  )
  sigma2 <- rinvgamma(1, 
                      shape = 0.001 + n/2, 
                      rate = 1/2*sum((Y - beta_0 - beta_1*X)^2) + 0.001)
  
  # store them
  beta_0s[iter] <- beta_0
  beta_1s[iter] <- beta_1
  sigma2s[iter] <- sigma2
}

# burn-in
beta_0s <- beta_0s[(burnin+1):iter_num]
beta_1s <- beta_1s[(burnin+1):iter_num]
sigma2s <- sigma2s[(burnin+1):iter_num]

The following trace plots show that \(\beta_0,\beta_1,\sigma^2\) oscillate around 5, 2.8, and 90, respectively.

# trace plot
ts.plot(beta_0s, main = bquote("Tsplot of" ~ beta[.(0)] ), ylab = bquote(~beta[.(0)] ))

ts.plot(beta_1s, main = bquote("Tsplot of" ~ beta[.(1)] ), ylab = bquote(~beta[.(1)] ))

ts.plot(sigma2s, main = bquote("Tsplot of" ~ sigma^2 ), ylab = bquote(~ sigma^2 ))

(2) Gaussian Mixture Model (Using GMM.txt)

\[\begin{aligned} f(x| \boldsymbol \mu, \boldsymbol \delta) &= \sum_{k=1}^2\delta_k\mathcal N(x; \mu_k,1) \\ \mu_k &\sim \mathcal N(0,1),\quad k=1,2, \\ \boldsymbol \delta := (\delta_1,\delta_2)&\sim \text{Dirichlet}(\alpha_1, \alpha_2),\quad \alpha_k = 1/2. \end{aligned}\]
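Simulating from this mixture clarifies the role of the latent group label introduced next: first draw a component for each observation, then draw the observation from that component. A Python/NumPy sketch with illustrative weights and means (\(\pm1.8\), near the posterior locations reported later):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
delta = np.array([0.45, 0.55])     # illustrative mixing weights
mu = np.array([-1.8, 1.8])         # illustrative component means, variance 1

z = rng.choice(2, size=n, p=delta) # latent group labels (0-based here)
x = rng.normal(mu[z], 1.0)         # x_i | z_i ~ N(mu_{z_i}, 1)
print(x.shape)                     # (500,)
```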

1) prior of \(z\)

The latent variable \(z\in\{1,2\}\) indicates whether an observation \(x\) belongs to group \(k=1\) or \(k=2\), where group \(k\) corresponds to the \(k\)th normal distribution with mean \(\mu_k\) and variance 1.

Under \(f(x|\boldsymbol{\mu, \delta})\), a generated observation \(x\) belongs to group \(k\) with probability \(\delta_k\). Each latent indicator can therefore be viewed as categorical with probabilities \(\boldsymbol \delta\); with two groups, the count of observations in group 1 is \(\text{B}(n, \delta_1)\). So, \[f(z=k|\boldsymbol \delta) = \delta_k,\quad k=1,2.\]

2) posterior of \(z\)

The posterior of all \(z, \boldsymbol{\mu,\delta}\) can be derived as \[\begin{aligned} f(z, \boldsymbol{\mu,\delta}|x) &\propto f(x|z,\boldsymbol{\mu,\delta})f(z,\boldsymbol{\mu,\delta}) \\ &=f(x|z,\boldsymbol{\mu,\delta})f(z|\boldsymbol {\mu,\delta})f(\boldsymbol{\mu,\delta}) \\ &=f(x|z,\boldsymbol{\mu,\delta})f(z|\boldsymbol \delta)f(\boldsymbol{\mu}) f(\boldsymbol{\delta}) . \end{aligned}\]

Regarding the posterior of \(z\): when \(z=k\), we have \(x \sim \mathcal N(\mu_k, 1)\). That is, the conditional density of \(x\) given \(z\) and \(\boldsymbol\mu\) is
\[f(x|z,\boldsymbol\mu) = \mathcal N(x;\mu_z, 1).\]

We may use this so that we have \[\begin{aligned} f(z=k | x,\boldsymbol{\mu, \delta}) &\propto f(x|z,\boldsymbol{\mu,\delta})f(z|\boldsymbol \delta) \\ &= f(x|z,\boldsymbol{\mu})f(z|\boldsymbol \delta) \\ &= \mathcal N(x;\mu_k,1)\cdot\delta_k, \end{aligned}\] where \(\mathcal N(x; \mu_k,1)\) denotes the pdf of the normal distribution with mean \(\mu_k\) and variance 1. Evaluating this unnormalized posterior for \(k=1,2\) and normalizing, we obtain \[f(z=k|x,\boldsymbol{\mu,\delta}) = \frac{\mathcal N(x;\mu_k,1)\cdot\delta_k}{\sum_{j=1}^2 \mathcal N(x;\mu_j,1)\cdot\delta_j}.\]
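The normalization step can be sketched directly. Below is a Python/NumPy illustration (the observations and parameter values are hypothetical): each row of 'resp' holds \(f(z_i=k|x_i,\boldsymbol{\mu,\delta})\) for \(k=1,2\), and each \(z_i\) is then drawn from its row.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([-2.0, 0.1, 1.9])     # a few illustrative observations
mu = np.array([-1.8, 1.8])
delta = np.array([0.5, 0.5])

# unnormalized posterior: delta_k * N(x_i; mu_k, 1), an (n x 2) array
unnorm = delta * np.exp(-0.5 * (x[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
resp = unnorm / unnorm.sum(axis=1, keepdims=True)   # each row sums to one
z = np.array([rng.choice(2, p=p) for p in resp])    # sample each z_i
print(resp.sum(axis=1))
```

An observation near \(-1.8\) gets almost all of its posterior mass on the first component, which is the behavior the Gibbs step relies on.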

3) posterior of \(\mu_k\)

Letting \(\boldsymbol x = (x_1,\dots,x_n)\) and \(\boldsymbol z = (z_1,\dots,z_n)\), the posterior can be written as \[\begin{aligned} f(\boldsymbol{z,\mu, \delta | x}) &\propto f(\boldsymbol{x|z,\mu,\delta})f(\boldsymbol {z|\delta})f(\boldsymbol{\mu}) f(\boldsymbol{\delta}) \\ &\propto f(\boldsymbol{x|z,\mu})f(\boldsymbol {z|\delta})f(\boldsymbol{\mu}) f(\boldsymbol{\delta}). \end{aligned}\] Then, letting \(n_k := \sum_{i=1}^n\mathbf{1}(z_i = k)\) denote the number of \(z_i\) equal to \(k\), \[\begin{aligned} f(\mu_k|-) &\propto \prod_{i:z_i=k}f(x_i|z_i=k,\mu_k)\cdot f(\mu_k) \\ &\propto \prod_{i:z_i=k}\exp\biggl\{-\frac{1}{2}(x_i - \mu_k)^2 \biggr\}\cdot\exp\biggl\{-\frac{1}{2}\mu_k^2 \biggr\} \\ &= \exp\biggl\{-\frac{1}{2}\sum_{i:z_i=k}(x_i - \mu_k)^2 -\frac{1}{2}\mu_k^2 \biggr\} \\ &\propto \exp\biggl\{-\frac{1}{2}\bigl(n_k + 1 \bigr)\mu_k^2 +\mu_k\sum_{i:z_i=k}x_i \biggr\} \\ &= \exp\biggl\{-\frac{1}{2}\bigl(n_k + 1 \bigr)\biggl(\mu_k^2 - 2\mu_k\frac{\sum_{i:z_i=k}x_i}{n_k+1} \biggr) \biggr\} \\ &\propto \exp\biggl\{-\frac{1}{2}\bigl(n_k + 1 \bigr)\biggl(\mu_k - \frac{\sum_{i:z_i=k}x_i}{n_k+1} \biggr)^2 \biggr\} \\ &\propto \mathcal N\biggl(\frac{\sum_{i:z_i=k}x_i}{n_k+1}, \frac{1}{n_k+1} \biggr). \end{aligned}\]

4) posterior of \(\boldsymbol \delta\)

\[\begin{aligned} f(\boldsymbol \delta | -) &\propto f(\boldsymbol {z|\delta} )f(\boldsymbol \delta) \\ &\propto \biggl(\prod_{k=1}^2\delta_k^{n_k}\biggr)\biggl(\prod_{k=1}^2\delta_k^{\alpha_k-1}\biggr) \\ &= \prod_{k=1}^2\delta_k^{\alpha_k+n_k-1} \\ &\propto \text{Dirichlet}(\alpha_1+n_1, \alpha_2+n_2). \end{aligned}\]
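The \(\boldsymbol\delta\) update is then a single Dirichlet draw with the prior parameters incremented by the group counts. A Python/NumPy sketch (the counts are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5])       # prior parameters alpha_k = 1/2
ns = np.array([222, 278])          # hypothetical counts n_1, n_2

delta = rng.dirichlet(alpha + ns)  # Dirichlet(alpha_1 + n_1, alpha_2 + n_2)
print(delta.sum())
```

With \(K=2\) this is equivalent to drawing \(\delta_1\) from a Beta distribution and setting \(\delta_2 = 1-\delta_1\).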

5) Gibbs sampler implementation

This code uses the ‘gtools’ package for the Dirichlet distribution. The ‘ns’ vector stores the counts \(n_k\). The iteration number and burn-in period are the same as in (1).

rm(list = ls())
library(gtools) # for Dirichlet distribution
set.seed(2023311161)

GMM <- read.table("GMM.txt")
N <- nrow(GMM)
K <- 2

# initialization
z <- sample(1:K, N, replace = TRUE) 
mu <- rnorm(K, 0, 1)
delta <- rep(1/K, K)
alpha <- rep(1/K, K)
ns <- rep(N/K, K) 

iter_num <- 5000
burnin <- 2000

zs <- matrix(0, ncol = iter_num, nrow = N)
mus <- matrix(0,ncol = iter_num, nrow = K)
deltas <- matrix(0,ncol = iter_num, nrow = K)

While sampling \(z\), the elements of the ‘prob’ vector are the numerators of the posterior \(f(z_i=k|x_i, \boldsymbol{\mu,\delta}) \propto \mathcal N(x_i; \mu_k,1)\cdot\delta_k\). We pass this vector as the ‘prob’ argument of the ‘sample’ function, which normalizes it internally.

The ‘xsum_k’ variable corresponds to \(\sum_{i:z_i=k}x_i\), while ‘n_k’ is \(n_k\).

The parameter of the Dirichlet full conditional is obtained by adding the count \(n_k\) to the \(k\)th element of the prior ‘alpha’; the prior ‘alpha’ itself is kept fixed, so that counts do not accumulate across iterations.

# Gibbs sampler code
for(iter in 1:iter_num){
  # sample z
  for(n in 1:N){
    prob <- sapply(1:K, function(k){delta[k]*dnorm(GMM[n,], mu[k], 1) } ) # returns a K-dim vector
    z[n] <- sample(1:K, 1, prob = prob)
  }
  
  # sample mu 
  for(k in 1:K){
    xsum_k <- sum(GMM[z == k, ])
    n_k <- sum(z == k)
    mu[k] <- rnorm(1, xsum_k/(n_k + 1), sqrt(1/(n_k + 1)) )
    ns[k] <- n_k
  }
  
  # sample delta from Dirichlet(alpha_1 + n_1, alpha_2 + n_2)
  delta <- rdirichlet(1, alpha + ns) # keep the prior 'alpha' fixed
  
  # store the iter-th samples
  
  zs[, iter] <- z
  mus[, iter] <- mu
  deltas[, iter] <- delta
  
}

# burn-in
zs <- zs[, (burnin+1):iter_num]
mus <- mus[, (burnin+1):iter_num]
deltas <- deltas[, (burnin+1):iter_num]

The following trace plots of \(\mu_1, \mu_2\) show that they oscillate around -1.8 and 1.8, respectively.

ts.plot(mus[1,], main = bquote("Tsplot of" ~ mu[.(1)] ), ylab = bquote(~ mu[.(1)] ) )

ts.plot(mus[2,], main = bquote("Tsplot of" ~ mu[.(2)] ), ylab = bquote(~ mu[.(2)] ) )

The following trace plots show that \(\delta_1, \delta_2\) oscillate around 0.444 and 0.556, respectively.

for(k in 1:K){
  ts.plot(deltas[k,], main = bquote("Tsplot of" ~ delta[.(k)] ), ylab = bquote(~ delta[.(k)]) )
}

2. MH within Gibbs (Using drv.txt)

Rasch model

\[\begin{aligned} \text{logit}\bigl(\mathbb P(Y_{i,j} = 1|\theta_i, \beta_j)\bigr) &= \theta_i-\beta_j \\ \mathbb P(Y_{i,j} = 1|\theta_i, \beta_j) &= \frac{\exp\{\theta_i - \beta_j\}}{1 + \exp\{\theta_i - \beta_j\}} \\ f(\boldsymbol{y|\theta,\beta}) := L(\boldsymbol{\theta,\beta}) &= \prod_{i=1}^n\prod_{j=1}^p\biggl[\frac{\exp\{\theta_i - \beta_j\}}{1 + \exp\{\theta_i - \beta_j\}} \biggr]^{y_{i,j}}\biggl[ \frac{1}{1 + \exp\{\theta_i - \beta_j\}} \biggr]^{1-y_{i,j}} \\ f(\boldsymbol{\theta,\beta},\sigma^2) &\propto L(\boldsymbol{\theta,\beta})f(\boldsymbol \beta)f(\boldsymbol \theta | \sigma^2)f(\sigma^2) \end{aligned}\]

Prior Setting \[\begin{aligned} \beta_j &\sim \mathcal N(0, 2.5^2) \\ (\theta_i|\sigma^2) &\sim \mathcal N(0,\sigma^2) \\ \sigma^2 &\sim \text{IG}(a,b),\quad a=b=0.001 \end{aligned}\]

1) \(\boldsymbol\theta\) by MH

The target density of \(\theta_i\) is \[\begin{aligned} f(\theta_i|-) &\propto L(\theta_i, \boldsymbol\beta)f(\theta_i|\sigma^2) \\ &=f(\boldsymbol y |\theta_i, \boldsymbol \beta)f(\theta_i | \sigma^2) \\ &\propto \prod_{j=1}^p\biggl[\frac{\exp\{\theta_i - \beta_j\}}{1 + \exp\{\theta_i - \beta_j\}} \biggr]^{y_{i,j}}\biggl[ \frac{1}{1 + \exp\{\theta_i - \beta_j\}} \biggr]^{1-y_{i,j}}\cdot \exp\biggl\{-\frac{1}{2\sigma^2}\theta_i^2 \biggr\}. \end{aligned}\] The posterior of \(\theta_i\) has no closed-form distribution, so we use a proposal density, say \(g(\theta_i^*|\theta_i^{(t)})\) (at the \(t\)th iteration), and apply the Metropolis-Hastings algorithm to generate \(\theta_i\).

Let us set the proposal density to a normal distribution: \[g(\theta_i^*|\theta_i^{(t)}) := \mathcal N(\theta_i^*; \theta_i^{(t)} , \tau_{\theta}^2).\] Since the proposal is symmetric, i.e. \(g(\theta_i^*|\theta_i^{(t)}) =g(\theta_i^{(t)}|\theta_i^*)\), the acceptance ratio can be written as follows: \[\begin{aligned} R_{\theta_i} &:= \frac{f(\theta_i^*|-)/g(\theta_i^*|\theta_i^{(t)}) }{f(\theta_i^{(t)}|-)/g(\theta_i^{(t)}|\theta_i^{*}) } \\ &= \frac{f(\theta_i^*|-)}{f(\theta_i^{(t)}|-)} \\ &= \frac{f(\boldsymbol y | \theta_i^{*},\boldsymbol \beta)f(\theta_i^*|\sigma^2)}{f(\boldsymbol y | \theta_i^{(t)},\boldsymbol \beta)f(\theta_i^{(t)}|\sigma^2)} \\ &= \frac{f(\boldsymbol y | \theta_i^{*},\boldsymbol \beta) \mathcal N(\theta_i^*; 0,\sigma^2)}{f(\boldsymbol y | \theta_i^{(t)},\boldsymbol \beta) \mathcal N(\theta_i^{(t)};0,\sigma^2)} \end{aligned}\]

Recall that \(f(\boldsymbol y | \theta_i^{*},\boldsymbol \beta)\) is a likelihood. For numerical stability (to avoid overflow and underflow when the likelihood is tiny or the ratio extreme), we express the ratio through the log-likelihood. Letting \(l(\boldsymbol{\theta,\beta}) := \log L(\boldsymbol{\theta,\beta})\), \[R_{\theta_i} = \frac{f(\boldsymbol y | \theta_i^{*},\boldsymbol \beta) \mathcal N(\theta_i^*; 0,\sigma^2)}{f(\boldsymbol y | \theta_i^{(t)},\boldsymbol \beta) \mathcal N(\theta_i^{(t)};0,\sigma^2)} = \exp\Bigl\{l(\theta_i^*,\boldsymbol\beta) - l(\theta_i^{(t)},\boldsymbol\beta) \Bigr\}\cdot\frac{\mathcal N(\theta_i^*; 0,\sigma^2)}{\mathcal N(\theta_i^{(t)}; 0,\sigma^2)}\]
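In code, this ratio can be handled entirely on the log scale: accept when \(\log u < \log R_{\theta_i}\). The Python sketch below shows one such random-walk update for a single \(\theta_i\) (toy data; 'loglik' is a hypothetical helper, not the Rcpp function used later):

```python
import numpy as np

def loglik(theta, betas, y):
    # Rasch log-likelihood for one subject: sum_j y_j*eta_j - log(1 + exp(eta_j))
    eta = theta - betas
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

rng = np.random.default_rng(0)
betas = np.array([-0.5, 0.0, 0.5])   # illustrative item parameters
y = np.array([1, 1, 0])              # illustrative responses of one subject
sigma2, tau = 1.0, 0.5               # prior variance and proposal sd

theta = 0.0
prop = rng.normal(theta, tau)        # symmetric normal proposal
log_R = (loglik(prop, betas, y) - loglik(theta, betas, y)
         - 0.5 * prop**2 / sigma2 + 0.5 * theta**2 / sigma2)  # + log prior ratio
if np.log(rng.uniform()) < log_R:    # accept with probability min(1, R)
    theta = prop
print(np.isfinite(theta))            # True
```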

2) \(\boldsymbol\beta\) by MH

The target density (posterior) of \(\beta_j\) is \[\begin{aligned} f(\beta_j|-) &\propto f(\boldsymbol{y|\theta}, \beta_j)f(\beta_j) \\ &\propto \prod_{i=1}^n\biggl[\frac{\exp\{\theta_i - \beta_j\}}{1 + \exp\{\theta_i - \beta_j\}} \biggr]^{y_{i,j}}\biggl[ \frac{1}{1 + \exp\{\theta_i - \beta_j\}} \biggr]^{1-y_{i,j}} \exp\biggl\{-\frac{\beta_j^2}{2\cdot2.5^2}\biggr\} \end{aligned}\]

The closed form of this density is also difficult to determine, so we use another proposal density \(g(\beta_j^*|\beta_j^{(t)})\) and apply the Metropolis-Hastings algorithm. Letting \(g(\beta_j^*|\beta_j^{(t)}) := \mathcal N(\beta_j^*;\beta_j^{(t)},\tau_{\beta}^2)\), we have the following acceptance ratio: \[\begin{aligned} R_{\beta_j} &:= \frac{f(\beta_j^*|-)/g(\beta_j^*|\beta_j^{(t)})}{f(\beta_j^{(t)}|-)/g(\beta_j^{(t)}|\beta_j^*)} \\ &=\frac{f(\beta_j^*|-)}{f(\beta_j^{(t)}|-)} \\ &=\frac{f(\boldsymbol{y|\theta},\beta_j^*) f(\beta_j^*)}{f(\boldsymbol{y|\theta},\beta_j^{(t)}) f(\beta_j^{(t)})} \\ &=\exp\Bigl\{l(\boldsymbol\theta, \beta_j^*) - l(\boldsymbol\theta,\beta_j^{(t)}) \Bigr\} \cdot \frac{\mathcal N(\beta_j^*;0,2.5^2)}{\mathcal N(\beta_j^{(t)}; 0, 2.5^2)}. \end{aligned}\]

3) \(\sigma^2\) by Gibbs

The posterior of \(\sigma^2\) is \[\begin{aligned} f(\sigma^2|-) &\propto f(\boldsymbol\theta | \sigma^2)f(\sigma^2) \\ &= \biggl[\prod_{i=1}^n\frac{1}{(\sigma^2)^{1/2}}\exp\biggl\{-\frac{1}{2\sigma^2}\theta_i^2\biggr\} \biggr]\cdot(\sigma^2)^{-a-1}\exp\{-b/\sigma^2\} \\ &=(\sigma^2)^{-a-n/2-1}\exp\biggl\{-\Bigl(\frac{1}{2}\sum_{i=1}^n\theta_i^2 +b\Bigr)/\sigma^2 \biggr\} \\ &\propto \text{IG}\Bigl(a+n/2, b+\frac{1}{2}\sum_{i=1}^n\theta_i^2 \Bigr). \end{aligned}\]

Since the posterior of \(\sigma^2\) has a closed form (\(\text{Inv-}\Gamma\)), we can generate \(\sigma^2\) with a direct Gibbs step.

4) Rcpp code for MH

Let us first introduce a function for the log-likelihood. For given \(y_{i,j}\) and \(\theta_i, \beta_j\), the likelihood contribution is \(\biggl[\frac{\exp\{\theta_i - \beta_j\}}{1 + \exp\{\theta_i - \beta_j\}} \biggr]^{y_{i,j}}\biggl[ \frac{1}{1 + \exp\{\theta_i - \beta_j\}} \biggr]^{1-y_{i,j}}\). Taking logarithms, we have \[l(\theta_i,\beta_j) = y_{i,j}\log\biggl( \frac{\exp\{\theta_i - \beta_j\}}{1 + \exp\{\theta_i - \beta_j\}} \biggr) + (1-y_{i,j}) \log\biggl( \frac{1}{1 + \exp\{\theta_i - \beta_j\}} \biggr)\] Repeating this for all \(i=1,\dots,n\) and \(j=1,\dots,p\) and summing gives the full log-likelihood.
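As a side note, computing the probability first and then taking its logarithm can underflow when \(\theta_i-\beta_j\) is very negative; the same term simplifies to \(y_{i,j}(\theta_i-\beta_j) - \log(1+\exp\{\theta_i-\beta_j\})\), which can be evaluated stably. A Python/NumPy sketch of this vectorized alternative (all values hypothetical):

```python
import numpy as np

def rasch_loglik(thetas, betas, Y):
    # eta[i, j] = theta_i - beta_j; each term is y*eta - log(1 + exp(eta)),
    # computed stably with logaddexp
    eta = thetas[:, None] - betas[None, :]
    return float(np.sum(Y * eta - np.logaddexp(0.0, eta)))

rng = np.random.default_rng(0)
n, p = 4, 3                                   # hypothetical dimensions
thetas = rng.normal(size=n)
betas = rng.normal(size=p)
Y = rng.integers(0, 2, size=(n, p))           # hypothetical 0/1 responses
print(rasch_loglik(thetas, betas, Y) < 0)     # True: every 0/1 term has log-prob < 0
```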

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double loglikelihood(NumericVector thetas, NumericVector betas, NumericMatrix Y){
  int n = Y.nrow();
  int p = Y.ncol();
  double logL = 0.0;
  
  for(int i = 0;i<n;i++){
    for(int j = 0; j<p; j++){
      double prob = exp(thetas[i] - betas[j])/(1 + exp(thetas[i] - betas[j]) );
      logL += Y(i,j)*log(prob) + (1 - Y(i,j))*log(1-prob);
    }
  }
  
  return logL;
}

The next step is Metropolis-Hastings for \(\boldsymbol\theta\). Empirically, the standard deviation of the proposal density is set to 0.5 so that the empirical acceptance ratios of the \(\theta_i\)’s fall roughly between 0.2 and 0.5 (the corresponding R code is shown later).

The algorithm is the same as in the \(\boldsymbol \theta\) by MH section. Whether the proposal is accepted is determined by checking whether \(u\sim\text{Unif}(0,1)\) is less than ‘R_theta’, the acceptance ratio for each \(\theta_i\).

// [[Rcpp::export]]
NumericVector MH_theta(NumericVector thetas_cur, NumericVector betas, double sigma2, NumericMatrix Y){
  int n = thetas_cur.length();
  
  NumericVector thetas_new = clone(thetas_cur); //  do not change the original theta_cur
  
  for(int i = 0; i<n ; i++){
    double theta_old = thetas_new[i];
    double proposal = R::rnorm(theta_old, 0.5);
    
    // log-likelihood of the current state, and of the state with only
    // component i replaced by the proposal (earlier accepted moves are kept)
    double ll_cur = loglikelihood(thetas_new, betas, Y);
    thetas_new[i] = proposal;
    double ll_new = loglikelihood(thetas_new, betas, Y);
    
    // MH acceptance ratio: likelihood ratio times prior ratio
    double R_theta = exp(ll_new - ll_cur)*(R::dnorm(proposal, 0, sqrt(sigma2), 0)/R::dnorm(theta_old, 0, sqrt(sigma2), 0) );
    double u = R::runif(0,1);
    
    if(u >= R_theta){
      thetas_new[i] = theta_old; // reject: restore the current value
    }
    
  }
  
  return thetas_new;
}

The next step is Metropolis-Hastings for \(\boldsymbol\beta\). Empirically, the standard deviation of the proposal density is set to \(0.45\) so that the empirical acceptance ratios of the \(\beta_j\)’s fall roughly between 0.2 and 0.5 (the corresponding R code is shown later).

The algorithm is the same as in the \(\boldsymbol \beta\) by MH section. Whether the proposal is accepted is determined by checking whether \(u\sim\text{Unif}(0,1)\) is less than ‘R_beta’, the acceptance ratio for each \(\beta_j\).

// [[Rcpp::export]]
NumericVector MH_beta(NumericVector thetas, NumericVector betas_cur, double sigma2, NumericMatrix Y){
  int p = betas_cur.length();
  
  NumericVector betas_new = clone(betas_cur); 
  
  for(int j = 0; j<p ; j++){
    double beta_old = betas_new[j];
    double proposal = R::rnorm(beta_old, 0.45);
    
    // log-likelihood of the current state, and of the state with only
    // component j replaced by the proposal (earlier accepted moves are kept)
    double ll_cur = loglikelihood(thetas, betas_new, Y);
    betas_new[j] = proposal;
    double ll_new = loglikelihood(thetas, betas_new, Y);
    
    // MH acceptance ratio: likelihood ratio times prior ratio
    double R_beta = exp(ll_new - ll_cur)*(R::dnorm(proposal, 0, 2.5, 0)/R::dnorm(beta_old, 0, 2.5, 0) );
    double u = R::runif(0,1);
    
    if(u >= R_beta){
      betas_new[j] = beta_old; // reject: restore the current value
    }
    
  }
  
  return betas_new;
}

The Rcpp code above is stored in the file “MC_HW2-2_kohanjun.cpp”.

rm(list = ls())
library(Rcpp)
library(invgamma)
set.seed(2023311161)

drv <- read.table("drv.txt")
Y <- as.matrix(drv)
n <- nrow(Y); p <- ncol(Y)

sourceCpp("MC_HW2-2_kohanjun.cpp") # for MH

5) Gibbs sampler code

The following function plugs the posterior shape and rate parameters into the ‘rinvgamma’ function and returns a single draw.

sample_sigma2 <- function(thetas){
  n <- length(thetas)
  shape <- 0.001 + n/2
  rate <- 0.001 + 1/2*sum(thetas^2)
  
  return(rinvgamma(1, shape=shape, rate=rate))
}

6) initial values

The total number of iterations is set to 4000 (relatively low) because the whole computation is quite heavy despite leveraging Rcpp. The ‘acc_thetas’ and ‘acc_betas’ matrices record, for each \(\theta_i\) and \(\beta_j\), whether the proposal was accepted, so the empirical acceptance ratios can be checked.

# initial values
thetas_0 <- rnorm(n, 0, 0.001)
betas_0 <- rnorm(p, 0, 2.5)
sigma2_0 <- 0.001

iter_num <- 4000 # 3 minutes for 100
burnin <- 1000

thetas <- matrix(0, nrow = n, ncol = iter_num)
betas <-  matrix(0, nrow = p, ncol = iter_num)
sigma2 <- rep(0, iter_num)

# for acceptance ratios
acc_thetas <- matrix(0, nrow = n, ncol = iter_num)
acc_betas <-  matrix(0, nrow = p, ncol = iter_num)

7) MH-within-Gibbs

Based on this, the following code runs the MH-within-Gibbs sampler.

#MH within Gibbs
for(iter in 1:iter_num){
  thetas_i <- MH_theta(thetas_0, betas_0, sigma2_0, Y)
  betas_i <- MH_beta(thetas_i, betas_0, sigma2_0, Y)
  sigma2_i <- sample_sigma2(thetas_i)
  
  thetas[,iter] <- thetas_i
  betas[,iter] <- betas_i
  sigma2[iter] <- sigma2_i
  
  #for acceptance ratios
  acc_thetas[,iter] <- (thetas_0 != thetas_i)
  acc_betas[,iter] <- (betas_0 != betas_i)
  
  thetas_0 <- thetas_i
  betas_0 <- betas_i
  sigma2_0 <- sigma2_i
}
# burn in
thetas <- thetas[,(burnin+1):iter_num]
betas <- betas[,(burnin+1):iter_num]
sigma2 <- sigma2[(burnin+1):iter_num]

acc_thetas <- acc_thetas[,(burnin+1):iter_num]
acc_betas <- acc_betas[,(burnin+1):iter_num]

The \(i\)th and \(j\)th elements of ‘acc_r_thetas’ and ‘acc_r_betas’ give the empirical acceptance ratios of \(\theta_i\) and \(\beta_j\). For the \(\theta_i\)’s, we could increase the variance of the proposal, but this only prolongs the time the chain stays stuck near its initial state. So the standard deviation of the proposal is kept at 0.5, even though a few acceptance rates of the \(\theta_i\)’s slightly exceed 0.5.

# acceptance ratio
acc_r_thetas <- rowMeans(acc_thetas) # vector of length n
acc_r_betas <- rowMeans(acc_betas) # vector of length p

summary(acc_r_thetas)  
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4127  0.4348  0.4407  0.4419  0.4480  0.5067
summary(acc_r_betas) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2710  0.2811  0.2952  0.2980  0.3093  0.3613

8) results

The following shows the trace plots of all \(\theta_i\)’s, arranged in an 8-by-6 grid.

par(mar = c(1, 1, 1, 1), mfrow = c(8,6))

for(k in 1:n){
  ts.plot(thetas[k,], ylab = bquote(~ theta[.(k)]))
}

You may check the posterior mean of each \(\theta_i\) in the following matrix; the \((k,l)\) element corresponds to \(i = 10\cdot(k-1)+l\).

thetas_mean <- rowMeans(thetas) # length n
thetas_mean <- c(thetas_mean, NA, NA)
matrix(round(thetas_mean, digits = 3), ncol = 10, byrow = TRUE)
##         [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
##  [1,] -0.667 -0.256 -0.384 -0.515 -0.479 -0.133 -0.108 -0.496 -0.135 -0.143
##  [2,] -0.104 -0.678 -0.669 -0.136 -0.131 -0.254 -0.391 -0.560 -0.077 -0.369
##  [3,] -0.019 -0.664 -0.082 -0.525 -0.218 -0.673 -0.281 -0.148  0.551  0.047
##  [4,] -0.238 -0.267 -0.098 -0.163 -0.365 -0.123 -0.006 -0.265 -0.273 -0.652
##  [5,]  0.173  0.402 -0.132 -0.142 -0.391 -0.292 -0.129 -0.372 -0.400 -0.135
##  [6,] -0.247 -0.230 -0.496 -0.275 -0.435 -0.508 -0.361 -0.675 -0.080 -0.250
##  [7,] -0.698 -0.536 -0.116  0.013 -0.290  0.022 -0.143 -0.158 -0.394 -0.693
##  [8,] -0.122 -0.431  0.276  0.556  0.302  0.444 -0.149 -0.244 -0.826 -0.698
##  [9,] -0.114  0.149 -0.003 -0.257 -0.118 -0.138 -0.138 -0.127 -0.390 -0.273
## [10,]  0.427 -0.619 -0.002 -0.130 -0.262 -0.373  0.164 -0.967 -0.213 -0.265
## [11,] -0.095 -0.102 -0.155 -0.079 -0.117  0.065 -0.150 -0.284 -0.671 -0.698
## [12,] -0.098 -0.418  0.064 -0.401 -0.132 -0.505 -0.150 -0.101 -0.134 -0.410
## [13,]  0.180  0.007  0.051 -0.115 -0.499 -0.275 -0.262  0.776 -0.167 -0.107
## [14,] -0.131  0.123 -0.119 -0.255 -0.242  0.346 -0.428 -0.249  0.164 -0.233
## [15,]  0.031 -0.226 -0.386 -0.249 -0.821 -0.195  0.000 -0.127 -0.213 -0.008
## [16,] -0.113 -0.020  0.159  0.028  0.190 -0.409 -0.554  0.917  0.012  0.432
## [17,] -0.607 -0.411 -0.495  1.395 -0.134  0.934 -0.248  0.445  0.403 -0.085
## [18,]  0.886  0.712  0.164  0.758  0.317 -0.683 -0.526  0.431 -0.488  0.002
## [19,] -0.283  0.009  0.155 -0.141 -0.826  0.704 -0.383 -0.389  0.201  0.279
## [20,] -0.128  1.278  1.203  0.544 -0.657  1.254 -0.628 -0.529 -0.543 -0.418
## [21,] -0.127 -0.406 -0.397 -0.142  0.774  0.051 -0.259 -0.253 -0.337  0.721
## [22,]  1.261  0.892  0.153  0.607 -0.535 -0.145 -0.651 -0.571 -0.546 -0.117
## [23,] -0.548 -0.925  0.457 -0.661 -0.373  0.588 -0.094  0.010 -0.253  0.443
## [24,]  0.587  0.086  0.073 -0.165 -0.225  0.735 -0.094  0.625  0.698  0.160
## [25,] -0.163  0.126  0.038 -0.010 -0.307  0.224 -1.132  0.253 -0.103  0.216
## [26,] -0.298 -0.092 -0.119  1.083  1.193 -0.953 -0.517 -0.126 -0.158  0.018
## [27,] -0.099  0.047 -1.256  0.276 -0.227 -0.116 -0.009 -0.270 -0.529 -0.522
## [28,]  0.037 -0.247 -0.145 -0.522 -0.098  0.321 -0.527  0.195 -0.381 -0.675
## [29,] -1.064 -0.395  0.723  0.399  1.439  0.571 -0.432 -0.411 -0.376 -0.563
## [30,]  0.760  1.653 -0.277  0.027  0.307 -0.004 -0.498 -0.052  1.626  0.682
## [31,]  0.013 -0.819 -0.502  1.796  0.280  0.705  0.458 -0.109  0.071  0.606
## [32,] -0.014  0.162  0.029 -0.707  0.978  0.167  0.590 -0.703 -0.138 -0.570
## [33,]  0.571  0.176  0.068  1.068 -0.642 -0.102  0.003  1.663  1.384  1.103
## [34,]  1.649  0.566  1.040  0.544 -0.342 -0.290  1.472 -0.284 -0.014 -0.044
## [35,] -0.144 -0.142  0.295 -0.656  0.024 -0.811  1.597 -0.121 -0.243 -0.165
## [36,] -0.461  1.055 -0.293  1.230 -1.156  1.642  0.274  0.133  0.386 -0.276
## [37,]  0.026 -0.247 -0.831  0.128  0.453 -0.263  0.279  0.384  1.862  0.053
## [38,]  0.180 -0.400 -0.142  1.051  0.476  0.689  0.593  0.724  0.622  0.699
## [39,]  0.046 -0.427  0.286 -0.287 -0.819 -0.251  1.014  0.025 -0.010  0.760
## [40,] -0.154  0.280  0.276 -0.015  0.013 -0.965  0.062 -0.679  0.197 -0.552
## [41,]  0.276 -0.260  0.187  0.537  0.255  1.132  1.816  1.616  1.441  0.333
## [42,] -0.668 -0.274  0.577  0.302  0.559  0.127  0.434  0.623     NA     NA

The following shows the trace plots of all \(\beta_j\)’s, arranged in a 6-by-4 grid.

par(mar = c(1, 1, 1, 1), mfrow = c(6,4))
for(b in 1:p){
  ts.plot(betas[b,], ylab = bquote(~ beta[.(b)]))
}

The posterior means of the \(\beta_j\)’s are in the following matrix; again, the \((k,l)\) element corresponds to \(j = 10\cdot(k-1) + l\).

betas_mean <- rowMeans(betas) # length p
betas_mean <- c(betas_mean, rep(NA, 6))
matrix(round(betas_mean, digits = 3), ncol = 10, byrow = TRUE)
##        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
## [1,] -0.859 -0.832 -0.545  0.160 -0.631 -0.642 -0.179 -0.061 -1.953  0.558
## [2,]  1.100 -1.232 -1.588  0.949  1.608 -0.181 -1.362  0.783  1.083 -1.222
## [3,] -1.346  1.239  1.399 -0.401     NA     NA     NA     NA     NA     NA

The trace plot of \(\sigma^2\) is given as follows.

par(mfrow = c(1,1))
ts.plot(sigma2, ylab = bquote(~ sigma^2) )